InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling
Abstract
Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics. However, most existing methods suffer from producing repetitive topics that hinder further analysis and from performance decline caused by low-coverage dictionaries. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment with mutual information method. This works as a regularization to properly align topics and prevent degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations of a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
Similar resources
Cross-Lingual Latent Topic Extraction
Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to ...
Cross-lingual Information Retrieval Model Based on Bilingual Topic Correlation
How to construct relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account bilingual semantic relationships. The paper proposes a new model that builds the semantic relationship of bilingual parallel documents via partial least squares (PLS)....
Alignment by Maximization of Mutual Information
A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can ...
Multilingual and cross-lingual news topic tracking
We are presenting a working system for automated news analysis that ingests an average total of 7600 news articles per day in five languages. For each language, the system detects the major news stories of the day using a group-average unsupervised agglomerative clustering process. It also tracks, for each cluster, related groups of articles published over the previous seven days, using a cosin...
Statistical mechanics of mutual information maximization
An unsupervised learning procedure based on maximizing the mutual information between the outputs of two networks receiving different but statistically dependent inputs is analyzed (Becker S. and Hinton G., Nature, 355 (1992) 161). By exploiting a formal analogy to supervised learning in parity machines, the theory of zero-temperature Gibbs learning for the unsupervised procedure is presented...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26612